Task-based MT Evaluation: From Who/When/Where Extraction to Event Understanding
Abstract
Task-based machine translation (MT) evaluation asks: how well do people perform text-handling tasks given MT output? This method of evaluation yields an extrinsic assessment of an MT engine, in terms of users’ task performance on MT output. While this method is time-consuming, its key advantage is that MT users and stakeholders understand how to interpret the assessment results. Prior experiments showed that subjects can extract individual who-, when-, and where-type elements of information from MT output passages that were not especially fluent. This paper presents the results of a pilot study assessing a slightly more complex task: given such wh-items already identified in an MT output passage, how well can subjects select from and place these items into wh-typed slots to complete a sentence-template about the passage’s event? The results of the pilot with nearly sixty subjects, while only preliminary, indicate that this task was extremely challenging: given six test templates to complete, half of the subjects had no completely correct templates and 42% had exactly one completely correct template. The provisional interpretation of this pilot study is that event-based template completion defines a task ceiling against which to evaluate future improvements in MT engines.
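The template-completion task described above can be sketched as a toy scoring routine. This is a hypothetical illustration only: the slot names follow the paper's who/when/where typing, but the example items, template, and answer key are invented, and the "completely correct" criterion (all slots must match) is assumed from the abstract's wording.

```python
# Toy sketch of event-template completion scoring (invented data).
# Subjects place pre-identified wh-items into typed slots of a
# sentence-template; a template counts as completely correct only
# if every slot matches the answer key.

from typing import Dict, List

# wh-items already identified in an MT output passage (invented)
ITEMS = {
    "who": ["rebel forces", "the minister"],
    "when": ["on Tuesday", "last month"],
    "where": ["in the capital", "near the border"],
}

# Answer key for one template: "<who> attacked <where> <when>." (invented)
KEY = {"who": "rebel forces", "where": "near the border", "when": "on Tuesday"}


def template_correct(filled: Dict[str, str], key: Dict[str, str]) -> bool:
    """True only if every wh-typed slot matches the answer key."""
    return all(filled.get(slot) == answer for slot, answer in key.items())


def score_subjects(responses: List[List[Dict[str, str]]],
                   keys: List[Dict[str, str]]) -> List[int]:
    """Count of completely correct templates per subject."""
    return [sum(template_correct(f, k) for f, k in zip(resp, keys))
            for resp in responses]


# Two invented subjects completing the same single template:
subjects = [
    {"who": "rebel forces", "where": "near the border", "when": "on Tuesday"},
    {"who": "the minister", "where": "near the border", "when": "on Tuesday"},
]
print(score_subjects([[s] for s in subjects], [KEY]))  # -> [1, 0]
```

Under this all-or-nothing criterion, a single misplaced wh-item zeroes out the whole template, which is one plausible reading of why, with six templates, half the pilot subjects scored zero completely correct.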
Similar Articles
Task-based MT evaluation
Task-based machine translation (MT) evaluation asks, how well do people perform text-handling tasks given MT output? This method of evaluation yields an extrinsic assessment of an MT engine, in terms of users’ task performance on MT output. While this method is time-consuming, its key advantage is that MT users and stakeholders understand how to interpret the assessment results. Prior experimen...
Ground Truth, Reference Truth & “Omniscient Truth” -- Parallel Phrases in Parallel Texts for MT Evaluation
Recently introduced automated methods of evaluating machine translation (MT) systems require the construction of parallel corpora of source language (SL) texts with human reference translations in the target language (TL). We present a novel method of exploiting and augmenting these resources for task-based MT evaluation, assessing how accurately people can extract Who, When, and Where elements...
A Statistical Analysis of Automated MT Evaluation Metrics for Assessments in Task-Based MT Evaluation
This paper applies nonparametric statistical techniques to Machine Translation (MT) Evaluation using data from a large scale task-based study. In particular, the relationship between human task performance on an information extraction task with translated documents and well-known automated translation evaluation metric scores for those documents is studied. Findings from a correlation analysis ...
A Task-Oriented Evaluation Metric for Machine Translation
Evaluation remains an open and fundamental issue for machine translation (MT). The inherent subjectivity of any judgment about the quality of translation, whether human or machine, and the diversity of end uses and users of translated material, contribute to the difficulty of establishing relevant and efficient evaluation methods. The US Federal Intelligent Document Understanding Laboratory (FI...
An Investigation of the Relationship Between Automated Machine Translation Evaluation Metrics and User Performance on an Information Extraction Task
Doctoral dissertation by Calandra Rilette Tate (2007), directed by Professor Eric V. Slud, Department of Mathematics, and co-directed by Professor Bonnie J. Dorr, Department of Computer Science. This dissertation applies ...